Distributed suffix trees
Abstract
We present a new variant of the suffix tree, called a distributed suffix tree (DST), that allows much larger databases of strings to be handled efficiently. The method is based on a new linear-time construction algorithm for subtrees of a suffix tree. The new data structure tackles the memory bottleneck problem by constructing these subtrees independently and in parallel. It is designed for distributed-memory parallel computing environments (e.g., Beowulf clusters). The central advantage is that standard operations of biological importance on suffix trees are shown to be easily translatable to this new data structure. While none of these operations on the DST require inter-process communication, many have optimal expected parallel running times.
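To make the idea of independently built subtrees concrete, here is a minimal Python sketch. It is not the paper's linear-time construction: it assumes a naive, uncompacted suffix trie, partitions suffixes by their first `k` characters, and uses a thread pool as a stand-in for the nodes of a cluster. The names `partition`, `build_subtree`, `build_distributed`, `find`, and the parameter `k` are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def partition(text, k):
    """Group suffix start positions by the first k characters of each suffix."""
    groups = defaultdict(list)
    for i in range(len(text)):
        groups[text[i:i + k]].append(i)
    return groups


def build_subtree(text, positions):
    """Naive (quadratic) trie over the suffixes starting at `positions`.

    Assumption: an uncompacted trie stands in for a real suffix-tree subtree.
    The key None marks the end of a suffix and stores its start position.
    """
    root = {}
    for start in positions:
        node = root
        for ch in text[start:]:
            node = node.setdefault(ch, {})
        node.setdefault(None, []).append(start)
    return root


def build_distributed(text, k):
    """Build one subtree per k-character prefix; no build needs another's data."""
    groups = partition(text, k)
    prefixes = list(groups)
    with ThreadPoolExecutor() as pool:  # threads stand in for cluster nodes
        trees = pool.map(lambda p: build_subtree(text, groups[p]), prefixes)
        return dict(zip(prefixes, trees))


def collect(node):
    """Return all suffix start positions stored at or below `node`."""
    out = list(node.get(None, []))
    for ch, child in node.items():
        if ch is not None:
            out.extend(collect(child))
    return out


def find(subtrees, pattern, k):
    """Return sorted start positions of `pattern`, searching only owning subtrees."""
    if len(pattern) >= k:
        keys = [pattern[:k]] if pattern[:k] in subtrees else []
    else:
        # A pattern shorter than k may occur in several subtrees.
        keys = [p for p in subtrees if p.startswith(pattern)]
    hits = []
    for key in keys:
        node = subtrees[key]
        for ch in pattern:
            node = node.get(ch)
            if node is None:
                break
        else:
            hits.extend(collect(node))
    return sorted(hits)


subtrees = build_distributed("banana", 2)
print(find(subtrees, "ana", 2))  # [1, 3]
```

In this sketch a query of length at least `k` touches exactly one subtree, and shorter queries touch several subtrees whose results are simply merged. Either way the subtrees never exchange data, which mirrors the abstract's claim that operations need no inter-process communication.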
Similar resources
Distributed and Paged Suffix Trees for Large Genetic Databases
Overview Why do we need distributed and paged suffix trees? What do we mean by a distributed or paged suffix tree? How can we construct one efficiently? What can we do with it?
Compact Suffix Trees Resemble PATRICIA Tries: Limiting Distribution of the Depth
Suffix trees are the most frequently used data structures in algorithms on words. In this paper, we consider the depth of a compact suffix tree, also known as the PAT tree, under some simple probabilistic assumptions. For a biased memoryless source, we prove that the limiting distribution for the depth in a PAT tree is the same as the limiting distribution for the depth in a PATRICIA trie, even...
A New Parallel Partition Algorithm for Parallel Suffix Tree Construction
The suffix tree is a compacted trie of all suffixes of a given string. It is a fundamental data structure in a wide range of domains such as text processing, data compression, computer vision, computational biology, and so on [1]. Moreover, it can be used for network researches such as web analysis, which has been studied actively [2], [3]. For example, suffix trees have been utilized to effect...
Faster Suffix Tree Construction with Missing
We consider suffix tree construction for situations with missing suffix links. Two examples of such situations are suffix trees for parameterized strings and suffix trees for two-dimensional arrays. These trees also have the property that the node degrees may be large. We add a new backpropagation component to McCreight’s algorithm and also give a high probability hashing scheme for large degre...
Computing suffix links for suffix trees and arrays
We present a new and simple algorithm to reconstruct suffix links in suffix trees and suffix arrays. The algorithm is based on observations regarding suffix tree construction algorithms. With our algorithm we bring suffix arrays even closer to the ease of use and implementation of suffix trees.
Journal: J. Discrete Algorithms
Volume: 3
Publication date: 2005